A two-step sequential approach for hyperparameter selection in finite context models

Contente, José, Martins, Ana, Pinho, Armando J., Gouveia, Sónia

arXiv.org Machine Learning

Finite-context models (FCMs) are widely used for compressing symbolic sequences such as DNA, where predictive performance depends critically on the context length k and the smoothing parameter α. In practice, these hyperparameters are typically selected through exhaustive search, which is computationally expensive and scales poorly with model complexity. This paper proposes a statistically grounded two-step sequential approach for efficient hyperparameter selection in FCMs. The key idea is to decompose the joint optimization problem into two independent stages. First, the context length k is estimated using categorical serial dependence measures, including Cramér's V, Cohen's κ, and partial mutual information (pami). Second, the smoothing parameter α is estimated via maximum likelihood conditional on the selected context length k. Simulation experiments were conducted on synthetic symbolic sequences generated by FCMs across multiple (k, α) configurations, considering a four-letter alphabet and different sample sizes. Results show that the dependence measures are substantially more sensitive to variations in k than in α, supporting the sequential estimation strategy. As expected, the accuracy of the hyperparameter estimation improves with increasing sample size. Furthermore, the proposed method achieves compression performance comparable to exhaustive grid search in terms of average bitrate (bits per symbol), while substantially reducing computational cost. Overall, the results on simulated data show that the proposed sequential approach is a practical and computationally efficient alternative to exhaustive hyperparameter tuning in FCMs.
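
As a concrete illustration of the two-step procedure, the following Python sketch estimates k from Cramér's V computed at increasing lags and then selects α by maximum likelihood (equivalently, minimum bits per symbol) given k. The thresholding rule in select_k, the one-dimensional α grid, and all function names are illustrative assumptions, not the authors' implementation.

    import numpy as np

    ALPHABET = "ACGT"  # four-letter alphabet, as in the simulations

    def cramers_v(seq, lag):
        """Cramér's V between symbols that are `lag` positions apart."""
        x = np.array([ALPHABET.index(s) for s in seq])
        a, b = x[:-lag], x[lag:]
        m = len(ALPHABET)
        table = np.zeros((m, m))
        np.add.at(table, (a, b), 1)
        n = table.sum()
        expected = table.sum(1, keepdims=True) * table.sum(0, keepdims=True) / n
        chi2 = ((table - expected) ** 2 / np.where(expected > 0, expected, 1)).sum()
        return np.sqrt(chi2 / (n * (m - 1)))

    def select_k(seq, k_max=8, threshold=0.05):
        """Step 1: largest lag whose serial dependence is non-negligible (illustrative rule)."""
        relevant = [lag for lag in range(1, k_max + 1) if cramers_v(seq, lag) > threshold]
        return max(relevant) if relevant else 1

    def avg_bits_per_symbol(seq, k, alpha):
        """Average code length of an adaptive order-k FCM with additive smoothing alpha."""
        m = len(ALPHABET)
        counts, bits = {}, 0.0
        for i in range(k, len(seq)):
            ctx, sym = seq[i - k:i], ALPHABET.index(seq[i])
            c = counts.setdefault(ctx, np.zeros(m))
            p = (c[sym] + alpha) / (c.sum() + alpha * m)
            bits -= np.log2(p)
            c[sym] += 1  # adaptive update, as in FCM-based compressors
        return bits / (len(seq) - k)

    def select_alpha(seq, k, grid=np.logspace(-3, 1, 30)):
        """Step 2: maximum-likelihood (minimum bitrate) choice of alpha, conditional on k."""
        return min(grid, key=lambda a: avg_bits_per_symbol(seq, k, a))
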






f1c1592588411002af340cbaedd6fc33-Supplemental.pdf

Neural Information Processing Systems

Figure 2: These two graphs cannot be distinguished by the 1-WL test. The COMBINE step takes the result of AGGREGATE and the previous representation of the current node as input. We reduce the FFN inner-layer dimension of 4d in [47] to d, which does not appreciably hurt performance but significantly saves parameters. The embedding dropout ratio is set to 0.1 by default in many previous Transformer works [11, 34]. The rest of the hyper-parameters remain unchanged. Table 8 summarizes the hyper-parameters used for fine-tuning Graphormer on OGBG-MolPCBA.
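
The FFN change described above (inner-layer dimension d instead of the usual 4d) is easy to picture with a minimal PyTorch-style sketch; the class and argument names are ours rather than Graphormer's code, and the dropout value simply mirrors the 0.1 default mentioned above.

    import torch.nn as nn

    class SlimFFN(nn.Module):
        """Transformer feed-forward block whose inner dimension is d rather than 4d.

        Parameter count drops from roughly 8*d^2 (two d-to-4d projections) to 2*d^2.
        """
        def __init__(self, d_model: int, dropout: float = 0.1):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(d_model, d_model),   # inner dim = d_model (vs. 4 * d_model in [47])
                nn.GELU(),
                nn.Dropout(dropout),
                nn.Linear(d_model, d_model),
            )

        def forward(self, x):
            return self.net(x)
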




c182ec594f38926b7fcb827635b9a8f4-Supplemental-Conference.pdf

Neural Information Processing Systems

Let q(Y; Θ) and c_K(Y, X) be two smooth, decomposable circuits that are compatible over Y; then computing their product as a circuit r_{Θ,K}(X, Y) = q(Y; Θ) · c_K(Y, X) that is decomposable over Y can be done in O(|q||c|). Let r(X, Y) be a circuit that is smooth, decomposable, and deterministic over Y; then for a configuration x its MAP state argmax_y r(x, y) can be computed in time O(|r|). For our experiments we use standard compilation tools to obtain a constraint circuit starting from a propositional logical formula in conjunctive normal form. We now illustrate step-by-step one example of such a compilation for a simple logical formula. Deterministic sum units represent disjoint solutions to the logical formula, meaning there exist distinct assignments, characterized by the children, that satisfy the logical constraint, e.g.
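
To make the O(|r|) MAP claim concrete, here is a minimal Python sketch of MAP inference in a smooth, decomposable circuit that is deterministic over Y: sum units are maximized over their children, product units multiply, and caching makes the pass linear in circuit size. The node representation and names are ours, not the paper's.

    from dataclasses import dataclass
    from typing import Tuple

    @dataclass
    class Node:
        kind: str                        # "leaf", "prod", or "sum"
        children: Tuple["Node", ...] = ()
        weights: Tuple[float, ...] = ()  # sum units only
        var: str = ""                    # leaves only: indicator [var == value]
        value: int = 0

    def map_state(node, x, cache=None):
        """Return (max circuit value, maximizing assignment of Y) for configuration x.

        Each node is visited once thanks to the cache, hence O(|r|) overall.
        Determinism over Y ensures that replacing sums by max yields the true MAP.
        """
        if cache is None:
            cache = {}
        if id(node) in cache:
            return cache[id(node)]
        if node.kind == "leaf":
            if node.var in x:                      # X-leaf: evaluate on the given x
                res = (1.0 if x[node.var] == node.value else 0.0, {})
            else:                                  # Y-leaf: free, so maximize it
                res = (1.0, {node.var: node.value})
        elif node.kind == "prod":                  # decomposability: disjoint scopes
            val, assign = 1.0, {}
            for ch in node.children:
                v, a = map_state(ch, x, cache)
                val *= v
                assign.update(a)
            res = (val, assign)
        else:                                      # deterministic sum: keep the best child
            best_val, best_assign = float("-inf"), {}
            for w, ch in zip(node.weights, node.children):
                v, a = map_state(ch, x, cache)
                if w * v > best_val:
                    best_val, best_assign = w * v, a
            res = (best_val, best_assign)
        cache[id(node)] = res
        return res
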